Subspace MOA: Subspace Stream Clustering Evaluation Using the MOA Framework

نویسندگان

  • Marwan Hassani
  • Yunsu Kim
  • Thomas Seidl
چکیده

Most available static data are becoming more and more highdimensional. Therefore, subspace clustering, which aims at finding clusters not only within the full dimension but also within subgroups of dimensions, has gained a significant importance. Recently, OpenSubspace framework was proposed to evaluate and explorate subspace clustering algorithms in WEKA with a rich body of most state of the art subspace clustering algorithms and measures. Parallel to it, MOA (Massive Online Analysis) framework was developed also above WEKA to provide algorithms and evaluation methods for mining tasks on evolving data streams over the full space only. Similar to static data, most streaming data sources are becoming highdimensional, and tracking their evolving clusters is also becoming important and challenging. In this demonstrator, we present, to the best of our knowledge, the first subspace clustering evaluation framework over data streams called Subspace MOA. Our demonstrator follows the onlineoffline model which is used in most data stream clustering algorithms. In the online phase, users have the possibility to select one of seven most famous summarization techniques to form the microclusters. In the offline phase, one of ten subspace clustering algorithms can be selected. The framework is supported with a subspace stream generator, a visualization interface to present the evolving clusters over different subspaces, and various subspace clustering evaluation measures.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Effective Evaluation Measures for Subspace Clustering of Data Streams

Nowadays, most streaming data sources are becoming highdimensional. Accordingly, subspace stream clustering, which aims at finding evolving clusters within subgroups of dimensions, has gained a significant importance. However, existing subspace clustering evaluation measures are mainly designed for static data, and cannot reflect the quality of the evolving nature of data streams. On the other ...

متن کامل

Benchmarking Stream Clustering Algorithms within the MOA Framework

In today’s applications, massive, evolving data streams are ubiquitous. To gain useful information from this data, real time clustering analysis for streams is needed. A multitude of stream clustering algorithms were introduced. However, assessing the effectiveness of such an algorithm is challenging, because up to now there is no tool that allows a direct comparison of these algorithms. We pre...

متن کامل

MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering

In today’s applications, massive, evolving data streams are ubiquitous. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problems of scaling up the implementation of state of the art algorithms to real world dataset sizes and of making algorithm...

متن کامل

Stream Data Mining: Platforms, Algorithms, Performance Evaluators and Research Trends

Streaming data are potentially infinite sequence of incoming data at very high speed and may evolve over the time. This causes several challenges in mining large scale high speed data streams in real time. Hence, this field has gained a lot of attention of researchers in previous years. This paper discusses various challenges associated with mining such data streams. Several available stream da...

متن کامل

Evaluation of Monte Carlo Subspace Clustering with OpenSubspace

We present the results of a thorough evaluation of the subspace clustering algorithm SEPC using the OpenSubspace framework. We show that SEPC outperforms competing projected and subspace clustering algorithms on synthetic and some real world data sets. We also show that SEPC can be used to effectively discover clusters with overlapping objects (i.e., subspace clustering).

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013